Annotating Large Email Datasets for Named Entity Recognition with Mechanical Turk
نویسندگان
چکیده
Amazon's Mechanical Turk service has been successfully applied to many natural language processing tasks. However, the task of named entity recognition presents unique challenges. In a large annotation task involving over 20,000 emails, we demonstrate that a compet itive bonus system and interannotator agree ment can be used to improve the quality of named entity annotations from Mechanical Turk. We also build several statistical named entity recognition models trained with these annotations, which compare favorably to sim ilar models trained on expert annotations.
منابع مشابه
PAYMA: A Tagged Corpus of Persian Named Entities
The goal in the named entity recognition task is to classify proper nouns of a piece of text into classes such as person, location, and organization. Named entity recognition is an important preprocessing step in many natural language processing tasks such as question-answering and summarization. Although many research studies have been conducted in this area in English and the state-of-the-art...
متن کاملAnnotating Named Entities in Twitter Data with Crowdsourcing
We describe our experience using both Amazon Mechanical Turk (MTurk) and CrowdFlower to collect simple named entity annotations for Twitter status updates. Unlike most genres that have traditionally been the focus of named entity experiments, Twitter is far more informal and abbreviated. The collected annotations and annotation techniques will provide a first step towards the full study of name...
متن کاملPreliminary Experience with Amazon’s Mechanical Turk for Annotating Medical Named Entities
Amazon’s Mechanical Turk (MTurk) service is becoming increasingly popular in Natural Language Processing (NLP) research. In this paper, we report our findings in using MTurk to annotate medical text extracted from clinical trial descriptions with three entity types: medical condition, medication, and laboratory test. We compared MTurk annotations with a gold standard manually created by a domai...
متن کاملPreliminary Experiments with Amazon's Mechanical Turk for Annotating Medical Named Entities
Amazon’s Mechanical Turk (MTurk) service is becoming increasingly popular in Natural Language Processing (NLP) research. In this paper, we report our findings in using MTurk to annotate medical text extracted from clinical trial descriptions with three entity types: medical condition, medication, and laboratory test. We compared MTurk annotations with a gold standard manually created by a domai...
متن کاملRobust Logistic Regression using Shift Parameters
Annotation errors can significantly hurt classifier performance, yet datasets are only growing noisier with the increased use of Amazon Mechanical Turk and techniques like distant supervision that automatically generate labels. In this paper, we present a robust extension of logistic regression that incorporates the possibility of mislabelling directly into the objective. This model can be trai...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2010